Very Predictive Ngrams for Space-Limited Probabilistic Models
نویسندگان
چکیده
In sequential prediction tasks, one repeatedly tries to predict the next element in a sequence. A classical way to solve these problems is to fit an order-n Markov model to the data, but fixed-order models are often bigger than they need to be. In a fixed-order model, all predictors are of length n, even if a shorter predictor would work just as well. We present a greedy algorithm, vpr, for finding variable-length predictive rules. Although vpr is not optimal, we show that on English text, it performs similarly to fixed-order models but uses fewer parameters.
منابع مشابه
Accounting ngrams and multi-word terms can improve topic models
The paper presents an empirical study of integrating ngrams and multi-word terms into topic models, while maintaining similarities between them and words based on their component structure. First, we adapt the PLSA-SIM algorithm to the more widespread LDA model and ngrams. Then we propose a novel algorithm LDA-ITER that allows the incorporation of the most suitable ngrams into topic models. The...
متن کاملمدل ترکیبی تحلیل مؤلفه اصلی احتمالاتی بانظارت در چارچوب کاهش بعد بدون اتلاف برای شناسایی چهره
In this paper, we first proposed the supervised version of probabilistic principal component analysis mixture model. Then, we consider a learning predictive model with projection penalties, as an approach for dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of data samples is obtained using the supervised...
متن کاملPerformance Evaluation of Dynamic Modulus Predictive Models for Asphalt Mixtures
Dynamic modulus characterizes the viscoelastic behavior of asphalt materials and is the most important input parameter for design and rehabilitation of flexible pavements using Mechanistic–Empirical Pavement Design Guide (MEPDG). Laboratory determination of dynamic modulus is very expensive and time consuming. To overcome this challenge, several predictive models were developed to determine dyn...
متن کاملFactored Models for Probabilistic Modal Logic
Modal logic represents knowledge that agents have about other agents’ knowledge. Probabilistic modal logic further captures probabilistic beliefs about probabilistic beliefs. Models in those logics are useful for understanding and decision making in conversations, bargaining situations, and competitions. Unfortunately, probabilistic modal structures are impractical for large real-world applicat...
متن کاملOccurrence Based Statistics in Machine Translation
As MT approaches demand longer context for better translation quality, the limitations of current language modeling techniques become explicit. The computational inability to model the likelihood of longer ngrams and the likelihood of their usage in probabilistic manner, have prevented us from exploring long ngrams in MT. In this paper, we propose and investigate a new set of features called oc...
متن کامل